ABSTRACT


Early evolutionary thinkers proposed relatively simple models to describe processes of evolution, and these remain the basis of evolutionary models used today. Recent research has since shown that evolutionary relationships among plants can be complex and difficult to reconstruct, even from molecular data. In plants there is a continuum of processes. It ranges from reticulate relationships within a sexually reproducing population, through incomplete lineage sorting and hybridization between recently diverged species and allopolyploidy between more distantly related species, to symbioses and endosymbiosis. These aspects of plant biology can create practical problems for interpreting bifurcating gene trees and identifying species. The promise of “omics” is that it will provide data and analyses to improve our understanding of the nature of species and their phylogenetic relationships. We highlight the importance of distinguishing evolutionary processes from evolutionary models. We also stress that a better understanding of micro-evolutionary processes is necessary to inform the current debate on whether or not to accept paraphyletic species.

Key words: Evolution, genomics, hybridization, paraphyly, species concepts.

Most researchers agree that practicality, usefulness, and predictability are important considerations when deciding whether or not paraphyletic species should be recognized in taxonomy. Some authors suggest that biological classification should capture more evolutionary information (Hörandl & Stuessy, 2010). Others question whether accepting paraphyletic species as units of classification better reflects processes of evolution and patterns of biological diversity (Schmidt-Lebuhn, 2011). A central question to resolve is: What can phylogenetic analyses tell us about the nature of species and evolutionary relationships? Genome science is rapidly informing this question, and the debate over recognition of paraphyletic species. Its promise lies in its potential to tell us much about paraphyly, polyphyly, the repeatability of evolution, and the spatial and temporal extent of interspecific gene flow. This information is important for three reasons: for understanding the nature of species, for judging whether gene tree and species tree methods are suitable for reconstructing evolutionary histories, and for developing robust taxonomies.


THE NOTION OF SPECIES USED WHEN BUILDING SPECIES TREES


It has been said that “all that is empirically demonstrable are species—they must exist because we observe discontinuities between groups of organisms (Coyne & Orr, 2004)—and clades, which are the inevitable consequence of common ancestry and speciation processes (Wiley, 2009), and are recognizable from their synapomorphies” (Schmidt-Lebuhn, 2011: 178).

The idea of “species” as a fundamental unit 
certainly predates Darwin and Wallace. Earlier 
concepts were not of species but were “forms” and 
“kinds” of plants and animals that could even appear 
spontaneously (as mentioned in the King James 
Version of the Bible published in 1611). However, 
during the 17th century the concepts of “species” 
and “genera” developed, and as far as we can tell, 
John Ray, in his Natural Theology book of 1691, was 
one of the earliest writers to discuss species in a 
modern sense (Raven, 1986). As pointed out by Mayr 
(1968: 165), Ray, in discussing criteria that identify 
species, considered it essential that species “spring 
from the seed of one and the same ...” This idea is in essence equivalent to the biological species concept, which would take its popular form 250 years later. Ray gathered plants from various places in England and grew them all in his own garden. In this way he separated inherited effects from environmental ones. He
concluded that “varieties” could come from the same 
parent plant and were members of the same species. 
Other authors of the time also used these early concepts of “members” of a set (species) within larger sets (genera) to describe different kinds of crystal structures (species) in rock. In doing so, they gave no suggestion or implication that species, whether plants, animals, or rocks, could ever change or merge into other species over time. This way of thinking was formalized in the 18th century, when Linnaeus (1736) codified a system for classifying animals and plants. The system was universally adopted, and its basic unit was the species. Much later, when championing the biological species concept, Mayr also emphasized the discrete and distinct nature of species. He saw this as a consequence of individuals from one species being reproductively isolated from individuals of another species (Mayr, 1963; Donoghue, 1985). Some botanists have abandoned the biological species concept when discussing plant diversity. Even so, Mayr’s way of thinking about species has most influenced the development of modern methods for species delimitation (Fujita et al., 2012) and current methods for species tree reconstruction. An important question is whether such approaches are generally appropriate in plant systematics.


ADOPTION OF A TREELIKE MODEL FOR EVOLUTION 


The developers of species tree methodology have taken some ideas from Darwin but not others. Darwin was convinced of descent with modification and, at least partly, of the treelike nature of evolutionary processes (Penny, 2011). He recognized morphological discontinuities as the result of the processes of divergence and extinction (Mallet, 2008a). One could assume that he had a species tree in mind when he first sketched a sticklike tree figure in his now famous notebook (see <http://darwin-online.org.uk/>). However, it could also have represented populations or varieties. It is not clear what he meant in terms of evolutionary relationships and the biological processes depicted. Both Darwin and Wallace had a notion of species trees and of models for describing evolutionary relationships between species. In his 1855 paper, Wallace referred to “trees” and “evolution” in at least two places. In the first, he says: “Much discussion has of late years taken place on the question, whether the succession of life upon the globe has been from a lower to a higher degree of organization?” (p. 191). He goes on to state that “the progression from Fishes to Reptiles and Mammalia, and also from the lower mammals to the higher, is indisputable.” In the second passage, which immediately follows, he says: “returning to the analogy of a branching tree, as best mode of representing the natural arrangement of species and their successive creation, let us suppose that at an early geological epoch any group (say a class of Mollusca) has attained to great richness of species and high organization. Now let this great branch of allied species by geological mutations, be completely or partially destroyed. Subsequently a new branch springs from the same trunk, that is to say, new species are successively created, having their antitypes the same lower organised species which had served as the antitypes for the former group, but which have survived the modified conditions, which destroyed it. ...” Wallace clearly had the idea of a phylogenetic tree, even though he had not fleshed out this concept in detail (Brooks, 1984). Darwin noted that Wallace “uses my simile of tree” on his copy of Wallace’s paper, probably annotated about 1856–1857.

In On the Origin of Species, Darwin (1859) goes 
over much of the same ground as Wallace. One might 
think that it is a species tree, but he, too, does not 
detail the nature of this tree; the only tree figure in 
On the Origin of Species is one that Darwin largely 
moved from his unpublished “Big Book” (Stauffer, 
1975), where Darwin had another purpose in mind for 
this tree. There it was bound up with his somewhat 
undefined concept of “The Principle of Divergence.” 
Simply put, “The Principle of Divergence” is the 
outcome of the “Struggle for Existence” in terms of 
species (or as the full title to On the Origin of Species 
puts it “By Natural Selection or the Survival of 
Favoured Races in the Struggle for Life”). However, 
note that neither Wallace nor Darwin followed this 
through in a tree that mapped out how a species tree 
might look. Darwin’s (non-binary) tree, as first envisaged in the “Big Book,” showed how populations, varieties, and species might change under
competition and under the forces of extinction (not 
individually but over generations), and it was not 
concerned with a genealogy showing all species 
formations and losses. In On the Origin of Species, 
Darwin used much the same diagram to map how a 
group of related organisms (species or subspecies?) 
might change over time. However, in the diagram, 
Darwin did not define species. He was very careful 
not to do this, and he viewed the situation as being 
very plastic between races, species, and genera. 

Perhaps ironically, Darwin’s treelike representation is the evolutionary model adopted for reconstructing and visualizing evolutionary relationships, even though Darwin has been criticized for his concept of species (Mallet, 2008a). This duality also applies to recent species tree reconstruction methods. They assume a treelike evolutionary model but do not model gene flow between species.


BUILDING SPECIES TREES 


Recent years have seen exciting developments in methodology for reconstructing evolutionary relationships between species. In particular, numerous new “multi-species coalescent” methods have been developed for inferring evolutionary relationships from nucleotide sequence data (Knowles, 2009; Kubatko et al., 2009; Liu et al., 2009; Heled & Drummond, 2010; Fan & Kubatko, 2011; Bryant et al., 2012). These methods rest on the assumption that species comprise monophyletic assemblages of individuals. They assume a treelike model of evolution. They accept that individual gene trees might on occasion indicate paraphyly. Nevertheless, they hold that such gene trees have evolved on an underlying species tree in which each species is genetically isolated. On that tree, individuals of each new species are monophyletic with respect to individuals of other species. Species tree building methods do not test whether this is an appropriate model for describing the evolutionary relationships among the taxa being studied.

Tests can be applied before species tree reconstruction to infer the evolutionary significance of observed patterns of monophyly among gene trees. The same tests can identify reproductively isolated “species” from minimum expectations for the extent of monophyly across independent genes (e.g., Rosenberg, 2007). Similarly, the expected distribution of gene trees on a given species tree can be calculated (Degnan & Salter, 2005). Given a large enough sample of independent gene trees, it is therefore possible in principle to examine how well the optimal species tree (representing the evolutionary model) fits the observed gene trees (Fan & Kubatko, 2011). In practice, this is not done. However, as we discuss below, it is possible to test whether gene trees have evolved under evolutionary models involving hybridization. If statistical evidence rejects the simpler scenario without hybridization, reconstructing multi-species coalescent species trees becomes more difficult.
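The goodness-of-fit idea can be sketched for the simplest possible case. The Python below assumes a rooted three-species tree ((A,B),C), a single internal branch of known length T in coalescent units, no gene flow, and one sampled gene copy per species. It uses the standard three-taxon result of the multi-species coalescent, not the general calculation of Degnan & Salter (2005). The function names and the gene tree counts in the usage note are hypothetical.

```python
import math


def expected_gene_tree_probs(T):
    """Expected rooted gene tree topology frequencies for the three-taxon
    species tree ((A,B),C) under the multi-species coalescent.

    T is the internal branch length in coalescent units
    (generations / 2N for a diploid population of constant size N).
    """
    if T < 0:
        raise ValueError("T must be non-negative")
    discordant = math.exp(-T) / 3.0  # each of the two mismatching topologies
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


def chi_square_fit(observed, T):
    """Pearson chi-square goodness-of-fit of observed gene tree counts to the
    expectation above, treating T as known (so 2 degrees of freedom).
    For 2 degrees of freedom the upper-tail probability is exp(-x / 2).
    """
    n = sum(observed.values())
    stat = 0.0
    for topology, p in expected_gene_tree_probs(T).items():
        expected = n * p
        stat += (observed.get(topology, 0) - expected) ** 2 / expected
    p_value = math.exp(-stat / 2.0)
    return stat, p_value
```

Consider hypothetical counts of 60, 30, and 10 loci for ((A,B),C), ((A,C),B), and ((B,C),A), with T = 0.5. The statistic is about 9.9 (P of about 0.007). The poor fit comes mainly from the unequal counts of the two discordant topologies, a pattern that lineage sorting alone does not predict. If T were instead estimated from the same loci, one degree of freedom would be lost.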

Despite this concern, multi-species coalescent methodology has value for plant systematics, because radiations have been important in the formation of many extant plant species (e.g., Martin et al., 2005; Winkworth et al., 2005; Linder, 2008; Pennington et al., 2010; Valente et al., 2010). Under such evolutionary scenarios, reconstructed gene trees are likely to contain short internal branches and long external branches. Phylogenetic accuracy can be low in this situation, particularly where the substitution model is misspecified (Hendy & Penny, 1989; Shavit et al., 2007). Furthermore, even accurately determined gene trees can be discordant with the true underlying species tree. Such discordance is theoretically expected under two conditions: the ancestral (now extinct) species had relatively large population sizes, and the times between diversification events near the base of the species tree were relatively short. Indeed, for certain combinations of ancestral population sizes and divergence times, the majority of gene trees are expected to be incongruent with the species tree on which they evolved (Degnan & Rosenberg, 2009). For these reasons, a methodology that accounts for discordant and paraphyletic gene trees caused by incomplete lineage sorting is very appealing. The hope is that multi-species coalescent methods will yield more reliable phylogenetic inferences that biologists can use to inform taxonomy.
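For three species, the standard multi-species coalescent result makes the last point concrete. The expressions below assume the following:

* a rooted species tree ((A,B),C);
* one sampled gene copy per species;
* no gene flow;
* a constant diploid ancestral population size N over an internal branch of t generations.

The numbers in the final line are illustrative only. They are not estimates for any real group.

```latex
T = \frac{t}{2N}, \qquad
P\big(((A,B),C)\big) = 1 - \tfrac{2}{3}e^{-T}, \qquad
P\big(((A,C),B)\big) = P\big(((B,C),A)\big) = \tfrac{1}{3}e^{-T}

P(\text{discordant}) = \tfrac{2}{3}e^{-T} > \tfrac{1}{2}
\iff e^{-T} > \tfrac{3}{4}
\iff T < \ln\tfrac{4}{3} \approx 0.288

% Illustrative values: N = 100{,}000 and t = 50{,}000 generations
T = 0.25 \;\Rightarrow\; P(\text{discordant}) = \tfrac{2}{3}e^{-0.25} \approx 0.519
```

With only three species, the matching topology remains the single most frequent one, even when most gene trees disagree with the species tree. Cases in which one particular discordant topology is the most frequent (the anomaly zone reviewed by Degnan & Rosenberg, 2009) require four or more species.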

The problem is that the multi-species coalescent models currently used for building species trees do not model gene flow between species. For organisms that diverge with gene flow, it is unclear how effective the methodology is. Too few empirical studies have yet been undertaken to know whether current implementations of multi-species coalescent tree methods provide a general and useful tool for plant systematics and taxonomy. One recent study examined six species of wild rice and found extensive discordance among their gene trees, more than was expected under a coalescent model with no gene flow. Not surprisingly, a Bayesian species tree method (BEST) was unable to converge on the expected species tree topology (Cranston et al., 2009). Other recent work has used computer simulations to study how BEST performs under horizontal gene transfer (as might occur through interspecific hybridization). BEST also performed poorly when introgressed sequences were distributed asymmetrically between species (Chung & Ané, 2011). Here, asymmetric means that a greater proportion of the genome of species A was present in species B than of the genome of B in species A. Nevertheless, this is still not very much information to go on. The disappointing results might reflect the coalescent model used more than the method itself. Some coalescent models do not assume symmetric patterns of gene flow (e.g., that developed by Beerli & Felsenstein, 2001). These are not yet implemented for building species trees, but such models might represent a useful direction for future research.
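The Python sketch below shows why unmodeled introgression matters. It compares expected gene tree frequencies for three species with and without one-directional introgression. It assumes, hypothetically, that a fraction gamma of loci in species B was inherited from species C. Those loci follow the parental tree ((B,C),A), while the rest follow the species tree ((A,B),C). Each parental tree then contributes the standard three-taxon multi-species coalescent expectation. This simplified two-parental-tree mixture follows the spirit of parental-tree network models (Meng & Kubatko, 2009). It is not the model used by BEST or by Chung & Ané (2011). The function names and parameter values are illustrative.

```python
import math


def msc_three_taxon(T):
    """Return (p_match, p_each_mismatch) for a rooted three-taxon parental
    tree with internal branch length T in coalescent units."""
    d = math.exp(-T) / 3.0
    return 1.0 - 2.0 * d, d


def introgression_mixture(T_species, T_introgressed, gamma):
    """Hypothetical mixture of two parental trees for taxa A, B, C.

    A proportion (1 - gamma) of loci evolve on the species tree ((A,B),C);
    a proportion gamma evolve on the parental tree ((B,C),A) created by
    introgression from C into B. Each parental tree has its own internal
    branch length (coalescent units).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    m1, d1 = msc_three_taxon(T_species)       # parental tree ((A,B),C)
    m2, d2 = msc_three_taxon(T_introgressed)  # parental tree ((B,C),A)
    return {
        "((A,B),C)": (1 - gamma) * m1 + gamma * d2,
        "((B,C),A)": (1 - gamma) * d1 + gamma * m2,
        "((A,C),B)": (1 - gamma) * d1 + gamma * d2,
    }
```

Without introgression (gamma = 0), the two discordant topologies are expected in equal proportions, as lineage sorting alone predicts. With gamma > 0, ((B,C),A) exceeds ((A,C),B) by gamma * (1 - exp(-T)) when both parental trees share the same internal branch length T. A method that attributes all discordance to lineage sorting has to absorb this excess somehow. That is one way to see how such methods can be misled.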
Hybridization between some species has gained acceptance only recently in the zoological research community (e.g., see discussions by Mallet, 2007, 2008b). In botany, by contrast, almost 300 years of investigation suggest its importance for understanding the nature of species and describing their relationships (e.g., Ehrendorfer, 1959; Stebbins, 1959; Arnold, 1997; Rieseberg, 1997). Recent work recognizes that hybridization occurs in both animal and plant species radiations (Herder et al., 2006; Mallet, 2007; Soltis & Soltis, 2009; Stemshorn et al., 2011). It also plays a role in evolutionary adaptation to environmental change (Hoffmann & Sgrò, 2011). Hybridization may be as common and evolutionarily significant as many researchers now suggest. If so, multi-species coalescent methods in their current form might have limited applicability for reconstructing plant species relationships and informing the debate on recognition of paraphyletic species.


NEW ZEALAND ALPINE RANUNCULUS 


Hybridization is regarded as a conspicuous feature 
of the New Zealand flora (Cockayne & Allan, 1926). 
Within this flora, a group of 20 or so alpine 
Ranunculus species was described (Fisher, 1965; 
Webb et al., 1988; Heenan et al., 2006) as an 
adaptive radiation wherein convergent morphologies 
appeared in similar habitats across the New Zealand 
landscape. Lockhart et al. (2001) reported phylogenetic analyses of nuclear ITS and chloroplast ycf1 sequences from New Zealand alpine Ranunculus species. These analyses uncovered numerous examples of non-monophyletic relationships in reconstructed gene trees. Additional sequencing of the same loci has since identified further examples of non-monophyly. Figure 1 illustrates relationships inferred for species representing one of the two main breeding groups. It has been unclear to us how such patterns of non-monophyly should be interpreted. Species from the same geographic region have nuclear ITS and chloroplast ycf1 sequences that are more similar to those of other species from the same locality than to those of members of their “own” species. The question for us is therefore similar to the one posed at the beginning of this article: What are the gene tree analyses telling us about the nature of New Zealand alpine Ranunculus species and their evolutionary relationships?

We speculate that Mallet’s (2007) genotypic cluster species concept might best explain these species. Under that interpretation, ecologically and morphologically significant traits and their underlying genetic determinants delimit the species. Nevertheless, neutral genetic markers have moved through regional gene flow between sympatric, reproductively compatible species. This interpretation would also be consistent with Fisher’s (1965) view of the species he described. He held that both interspecific hybridization and divergence are needed to explain morphological variation within and between their natural populations.

We are currently collecting more genetic data to test this hypothesis using the analytical approaches described in the following section. If our hypothesis is correct, it will have implications for applying multi-species coalescent tree reconstruction methods to our data. Regional interspecific gene flow may be high in New Zealand alpine Ranunculus. If so, we would not expect that adding more gene trees for neutral loci to a species tree reconstruction would reconcile the discordant gene trees into a meaningful species tree. Given this concern, a first step toward improving our understanding is to identify the occurrence and extent of past hybridization events involving alpine Ranunculus species. There is a general consensus on the importance of hybridization in plant evolution. Until recently, however, the extent of natural hybridization has been difficult to quantify rigorously (Arnold, 1997; Brumfield et al., 2008; Joly et al., 2009).


DISTINGUISHING HYBRIDIZATION FROM LINEAGE SORTING 


New methods based on coalescent models have recently been proposed for evaluating introgression (e.g., Joly et al., 2009; Pelser et al., 2010; Gerard et al., 2011; Joly, 2012). They might be applied after first noticing discordance between gene trees, or paraphyly or polyphyly in some reconstructed gene trees but not in others. In the first method (Joly et al., 2009), a species tree is inferred from independent loci assumed to be unaffected by hybridization in the taxa studied. Gene trees are then simulated on this species tree (or on a posterior distribution of species trees). The simulations use a coalescent model that allows for incomplete lineage sorting but not hybridization. The genetic distances between taxa in these simulated gene trees are then compared with genetic distances in the gene trees reconstructed for the taxa and molecular markers being evaluated for evidence of introgression. Specifically, where reconstructed gene trees show non-monophyly, the question is whether the observed genetic distances are significantly smaller than those expected in computer simulations under the assumed coalescent model. If they are, lineage sorting is excluded as an explanation for the non-monophyly, and hybridization is inferred. If they are not, incomplete lineage sorting remains a possible explanation for the data.

Figure 1. A. New Zealand alpine Ranunculus species and microhabitats; photo of R. acraeus: Anne Cartman; photo of R. buchananii: Bruce van Brunt. B. Geographic distributions according to Lockhart et al. (2001) and Heenan et al. (2006). C, D. Accessions sequenced for C and D are numbered: 1, Borland; 2, Mt. Anglem; 3, Edwards; 4, Jagged Stream; 5, Mt. Franklin; 6, Temple Basin; 7, Canyon Creek; 8, Franz Josef; 9, Hooker Valley; 10, Mt. Cook; 11, Mark Range; 12, Mt. Earnslaw; 13, French Ridge; 14, Lake Harris; 15, Mt. Tutoko; 16, Ocean Peak; 17, Mitre Peak; 18, Amuri; 19, Lake Tekapo; 20, Mt. Hutt; 21, Mt. St. Patrick; 22, Ben Ohau; 23, East Dome; 24, Eyre Mts.; 25, Symmetry Peaks; 26, Mt. Pisgah; 27, St. Mary; 28, Homer; 29, Lake Alta; 30, Mt. Brewster; 31, Mt. Burns; 32, Mt. Earnslaw; 33, Skeleton Lake; 34, Takitimu Mts. C. Neighbor-joining tree of chloroplast ycf1. D. Neighbor-joining tree of nrITS sequences. In Lockhart et al. (2001), the ycf1 region was referred to as JSA, the region where ycf1 is located in the chloroplast genome. Trees were constructed with Geneious version 5.6 (<http://www.geneious.com/>).

The approach suggested by Pelser et al. (2010) is similar, but these authors invert the argument.
Under incomplete lineage sorting, particular genetic distances between species are expected only for particular ancestral population sizes. If the population sizes required are far larger than is reasonable, incomplete lineage sorting can also be rejected. Meng and Kubatko (2009) and Gerard et al. (2011) estimate the extent of introgression in an a priori specified taxon in a different way. They compare observed gene trees with trees simulated on the parental trees embedded within a rooted reticulate phylogenetic network. Holland et al. (2008) called these parental trees “principal” trees.
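The first of these tests (Joly et al., 2009) can be sketched in a deliberately reduced form. The sketch uses two species, one sampled sequence from each, a known divergence time and mutation rate, and a single constant population size. The published approach works from full species trees (or posterior samples of them) and many individuals. The Python below only illustrates the logic: it asks whether an observed interspecific distance is smaller than lineage sorting alone would produce. The function names and parameter values are hypothetical. Mutations are modeled as a Poisson count on the coalescent tree, not by simulating sequences under a substitution model.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson(lam) variate using Knuth's multiplication method.

    Adequate for the moderate means used here; very large means would need
    a different sampler because exp(-lam) underflows.
    """
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_null_distances(tau, theta, n_sim, seed=1):
    """Simulate distances between one sequence from each of two species under
    a coalescent model with incomplete lineage sorting but NO gene flow.

    tau   -- species divergence time in coalescent units (2N generations)
    theta -- population mutation rate 4*N*mu for the whole locus
    The two lineages cannot coalesce before tau; in the ancestral population
    the extra waiting time is Exponential(1), and the expected number of
    differences between the two sequences is theta * (tau + waiting time).
    """
    rng = random.Random(seed)
    distances = []
    for _ in range(n_sim):
        coalescence_time = tau + rng.expovariate(1.0)
        distances.append(poisson(theta * coalescence_time, rng))
    return distances


def introgression_test(observed_distance, simulated):
    """One-sided Monte Carlo p-value: the proportion of simulated distances
    that are as small as, or smaller than, the observed distance."""
    hits = sum(1 for d in simulated if d <= observed_distance)
    return (hits + 1) / (len(simulated) + 1)
```

Take tau = 2.0 and theta = 10.0 as an example. The null model never lets the two sequences coalesce in less than 2 coalescent units. An observed distance of 3 differences is therefore essentially never matched in simulation, and the p-value sits near its minimum of 1/(n_sim + 1), which points to hybridization. An observed distance of 40 gives a large p-value, leaving incomplete lineage sorting as a possible explanation. The inverted argument of Pelser et al. (2010) would instead ask how large the ancestral population would have to be for such a small distance to become plausible.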

All of the above methods require that the reference 
species tree (or posterior distribution of species trees: 
Joly, 2012) is reliably inferred and not itself impacted 
by hybridization. This is a limitation and can potentially introduce circularity. In practice, it means that implementations of the test need to be conservative, and it might not be possible to include all species of a group in the simulations. When the test of Joly et al. (2009) was proposed, we illustrated it using a data set similar to that shown in Figure 1. However, at the P = 0.05 level we could not reject lineage sorting as an explanation for the observed patterns of paraphyly. We suspected this might reflect the relatively short concatenated chloroplast sequences used in the analyses. We have now sequenced complete chloroplast genomes for some of our alpine Ranunculus species (unpublished). We have also made progress in developing EST libraries for alpine Ranunculus species using Illumina sequencing protocols (Atherton et al., 2010; Gruenheit et al., 2012). However, analyses of these further data were not complete at the time of writing.
The hope is that, given these additional data, we will 
soon be in a better position to understand the 
meaning of the paraphyletic gene trees observed in 
phylogenetic reconstructions of our species (Fig. 1). 
That is, we look to these data to help us determine 
what gene trees are telling us about the nature of our 
species and their evolutionary relationships. 


GENOME SCIENCE AND THE NATURE OF SPECIES 


Recent studies have produced an overwhelming number of species concepts (see <http://www.ucl.ac.uk/taxome/jim/Sp/species.pdf>). They also leave a sense that authors often refer to different entities when discussing species in their favorite group of organisms. Introgression and horizontal gene transfer between species have evidently occurred in many groups of organisms, including plants, animals, and microbes (Dagan et al., 2008; Mallet, 2008b). These inferences have been made possible by the increasing number of genetic markers available for non-model organisms. With methods such as those described above, the power to detect hybrids increases greatly as more markers are analyzed (Joly et al., 2009; Gerard et al., 2011). High-throughput sequencing of genomes and transcriptomes will very soon give us our best glimpse yet of the genetic distinctiveness of species, their evolutionary transience, and the complexity of their relationships.


CONCLUSIONS


The significance of paraphyletic taxa in our phylogenetic reconstructions will become clear once we better understand the nature of species and their evolutionary relationships with other species. If interspecific gene flow is limited, paraphyletic gene trees are neither unexpected nor problematic for species tree reconstruction methods (Schmidt-Lebuhn, 2011). This is not the case, however, if hybridization is more pervasive. The concept of “species” used in species tree methods fits uncomfortably with the suggested complex nature of many plant species, including Ranunculus (Hörandl & Emadzade, 2012). In such cases, there remains uncertainty over how to interpret paraphyletic gene trees. It is also uncertain whether discordant gene trees indicate complex relationships among species or the incomplete lineage sorting characteristic of species radiations (Degnan & Rosenberg, 2009).
Analytical approaches such as those outlined above provide new tools to evaluate these trees better, and to assess the extent and importance of hybridization in nature. Such studies might show that hybridization is as significant as some authors have suggested. In that case, there will be more impetus to develop better coalescent models for species tree methodology (as Brumfield et al., 2008, have already suggested). Alternatively, such a finding might stimulate further development of two kinds of approaches for reconstructing species relationships. One kind is model free and based on criteria such as data partition concordance (e.g., Larget et al., 2010). The other moves away entirely from Darwin’s sticklike figure and considers heuristic solutions to the problem of reconstructing reticulate hybridization networks (e.g., Huson et al., 2005, as used in Pirie et al., 2009). Ultimately, the aim will be to better understand and describe the plant biodiversity in front of us. Progress will only come with improved understanding of the genetic data at hand. Developing this understanding will be an important contribution to the debate over whether or not to recognize paraphyletic species.